Authors: Andreas Novotny (1,2), Caterina Rodrigues (1,2), Loïc Jacquemot (1,2), Rute B. G. Clemente-Carvalho (2), Rebecca S. Piercey (2), Evan Morien (2), Moira Galbraith (3), Colleen T. E. Kellogg (2), Matthew A. Lemay (2), and Brian P. V. Hunt (1,4)
Institute for the Oceans and Fisheries, University of British Columbia, 2202 Main Mall, Vancouver, BC, V6T 1Z4 Canada
Hakai Institute, PO Box 25039, Campbell River, BC, V9W 0B7, Canada
Institute of Ocean Sciences, Fisheries and Oceans Canada, PO Box 6000, Sidney, BC, V8L 4B2, Canada
Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, 2202 Main Mall, Vancouver, BC, V6T 1Z4 Canada
In this study, we examined how well DNA metabarcoding capture changes in biomass of marine mesozooplankton. We evaluated the effectiveness of three different standardisation methods – relative abundance, presence-absence, and the eDNA-index – for correlating seasonal trends between DNA sequence read abundance and microscopic biomass estimates of zooplankton net samples collected every two weeks in a 1-year time series. To assess if taxa specific trends are sensitive to the selection of genetic marker, we sequenced both short fragment of the COI gene, with primers optimised to capture marine invertebrate community (Leray et al., 2013) and the 18S gene, with universal primers optimised for marine microbial eukaryotes (Balzano et al., 2015). In doing so, we provide both new insights into the reliability of DNA metabarcoding analysis and guidelines for future studies that aim to use environmental DNA to study marine community dynamics. Further, through analysis of discreet water column DNA samples, we demonstrate the potential of the DNA approach in resolving differences in vertical zooplankton distributions.
This R notebook was used to produce the graphical and
statistical output for the manuscript and its supplementary
figures and tables. See manuscript for details on sampling, experimental
design and sample processing.
Prior ro analysis in this script 18S and COI amplicons
have been sequenced on a MiSeq and raw Fastq sequences have been
uploaded to NCBI under the BioProject accession number PRJNA1141475 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1141475). All
sample-specific accession numbers are listed in the metadata files. See
further down.
DNA sequences (FASTQ files) were processed using a custom bioinformatic
pipeline (see Supplementary Data 2: https://doi.org/10.5281/zenodo.14241795) using DADA2,
with sequences annotated to the MetaZooGene database. ASV tables with
taxonomic annotations are found in the ./Data folder.
These R packages are required for executing the code below:
# Data import
library(readxl)
# All data format modifications and filtering and plotting
library(tidyverse)
'%!in%' <- function(x,y)!('%in%'(x,y))
# This script contains a function for calculating eDNA-index.
source("Code/Functions.R")
# Specific plotting tools
library(UpSetR) # For making Euler diagrams
library(eulerr) # For making Euler diagrams
library(ggOceanMaps) # For maps
library(ggspatial) # For map modifications
library(cowplot) # Simple plot themes
library(ggpubr) # Plot layout
library(patchwork) # Plot layout
library(ggnewscale) # Multiple scales on one plot
# Statistical analyses
library(vegan)
# Import all datasets from Bioinformatic Pipeline
#system2("cp", args = c(
# "../CoastalBCdata/Data_output/To_ZoopMethodsComp/*",
# "Data/"
#))
Samples were collected during 2017 from station QU39 in the Salish Sea, British Columbia, Canada.
# 1. Draw inner map with points for stations
# Make a dataframe cointatining staion coordinates
pin <- data.frame(lon = c(-125.0992),
lat = c(50.0307),
label = c("QU39"))
# Plot inner map
inner <-
# Define the limits (zoom of the map)
basemap(limits = c(-126.5, -121.8, 48.3, 51.3), rotate = TRUE) +
# Add red point for sampling station(s)
geom_spatial_point(data = pin, aes(x = lon, y = lat), color = "red", size = 2) +
# Add station labels, but adjuest position to not cover the point.
geom_spatial_label(data = pin, aes(x = lon-0.35, y = lat-0.15, label = label)) +
# Remove axis title
theme(axis.title.x=element_blank(),
axis.title.y=element_blank())
# 2. Draw outer map with box highlighting the inner map.
# Make a dataframe cointatining 4 coordinates defining box corners.
# Use same coordingates as the limits defined above.
insert_box <- data.frame(lon = c(-126.5, -126.5, -121.8, -121.8),
lat = c(48.3, 51.3, 51.3, 48.3))
# Plot outer map
outer <-
# Define the limits (zoom of the map)
basemap(limits = c(-130.0, -80.0, 35.0, 75.0), rotate = TRUE) +
# Draw polygon for inner map
geom_spatial_polygon(data = insert_box, aes(x = lon, y = lat),
fill = NA, color = "red") +
# Remove axis title
theme(axis.title.x=element_blank(),
axis.title.y=element_blank())
## `geom_polypath()` is deprecated: use `ggplot2::geom_polygon()` with the `subgroup` aesthetic
# 3. Make outer plot as insert of inner. Define size and location.
final_map <- inner + inset_element(outer, 0.5, 0.5, 1, 1)
ggsave("Figure_output/Fig.1B_StationMap.pdf", final_map, width = 4, height = 4)
## Assuming `crs = 4326` in stat_spatial_identity()
## Assuming `crs = 4326` in stat_spatial_identity()
## Assuming `crs = 4326` in stat_spatial_identity()
final_map
## Assuming `crs = 4326` in stat_spatial_identity()
## Assuming `crs = 4326` in stat_spatial_identity()
## Assuming `crs = 4326` in stat_spatial_identity()
rm(inner, outer, insert_box, pin)
Reading in all COI ASVs with taxonomic annotation and sample metadata.
# Sample metadata
metadata_COI <- read_csv("Data/Sequence_Metadata_COI_QU39_2017.csv")
## Rows: 257 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): library_ID, lat_lon, source_material_id, title, sra_accession, bi...
## dbl (2): depth, samp_store_temp
## date (1): collection_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# ASV and taxonomy table
ASV_COI <- read_csv("Data/ASV_COI_MZGdb_QU39_2017.csv")
## Rows: 213 Columns: 270
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): Kingdom, Phylum, Subphylum, Superclass, Subsuperclass, Class, Inf...
## dbl (258): Annotation_level, QMIC1955_COI_QU39-2017, QMIC1956_COI_QU39-2017,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Turn ASV table into "long" and merge with metadata.
All_COI_data <- ASV_COI %>%
pivot_longer(cols = !1:13, names_to = "library_ID", values_to = "Abundance") %>%
full_join(metadata_COI)
## Joining with `by = join_by(library_ID)`
rm(metadata_COI, ASV_COI)
Reading in all 18S ASVs with taxonomic annotation and sample metadata.
# Sample metadata
metadata_18S <- read_csv("Data/Sequence_Metadata_18S_QU39_2017.csv")
## Rows: 192 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): library_ID, lat_lon, source_material_id, title, sra_accession, bi...
## dbl (2): depth, samp_store_temp
## date (1): collection_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# ASV and taxonomy table
ASV_18S <- read_csv("Data/ASV_18S_MZGdb_QU39_2017.csv")
## Rows: 450 Columns: 205
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): Kingdom, Phylum, Subphylum, Superclass, Subsuperclass, Class, Inf...
## dbl (193): Annotation_level, QMIC1955_18S_QU39_3, QMIC1956_18S_QU39_3, QMIC1...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Turn ASV table into "long" and merge with metadata.
All_18S_data <- ASV_18S %>%
pivot_longer(cols = !1:13, names_to = "library_ID", values_to = "Abundance") %>%
full_join(metadata_18S)
## Joining with `by = join_by(library_ID)`
rm(metadata_18S, ASV_18S)
For DNA, three different sample types were collected: Filtered Water, Bulk Zooplankton nets, and filtered ethanol preservative from the zooplankton nets. Here we construct a new Type variable that is based on the “collection_method” from the sample metadata.
# 1. Get levels for collection method:
#All_COI_data %>%
# pull(collection_method) %>% unique()
#. 2. Rename the three levels with more useable names:
Filtered_Water <- "Sterivex Seawater filtration: https://dx.doi.org/10.17504/protocols.io.rm7vzjxrrlx1/v1"
Bulk_Zooplankton <- "Vertical zooplankton net tow, preserved in 99 percent ethanol"
Zooplankton_Preservative <- "Vertical zooplankton net tow, preserved in 99 percent ethanol. Etnahol subsampling: https://dx.doi.org/10.17504/protocols.io.j8nlk888wl5r/v1"
# Add new Type variable to COI metadata.
All_COI_data <- All_COI_data %>%
mutate(Type = ifelse(
collection_method == Filtered_Water,
"Filtered_Water",
ifelse(
collection_method == Bulk_Zooplankton,
"Bulk_Zooplankton",
"Zooplankton_Preservative")
))
# Add new Type variable to 18S metadata.
All_18S_data <- All_18S_data %>%
mutate(Type = ifelse(
collection_method == Filtered_Water,
"Filtered_Water",
ifelse(
collection_method == Bulk_Zooplankton,
"Bulk_Zooplankton",
"Zooplankton_Preservative")
))
Reading in data from taxonomic zooplankton analysis.
df_count_2017 <- read_csv("Data/Zooplankton_QU39_2017.csv")
## Rows: 11904 Columns: 32
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): Key, region_name, Station, Twilight, Net_Type, NOTES, Sample_ID, ...
## dbl (11): lon, lat, Mesh_Size(um), Net_Mouth_Dia(m), DEPTH_STRT, DEPTH_END,...
## lgl (1): CTD
## date (1): Date
## time (1): STN_TIME
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Libraries with poor sequencing depth are removed. Our threshold is set to 10,000 reads per sequencing library. Further, in the MetaZooGene database, the genus Corycaeus was sometimes, but not always, labeled as Ditrichocorycaeus. Ditrichocorycaeus is a potential subgenus, but is not yet commonly accepted. We correct that here.
For COI:
# 1. Filter out libraries under threshold line of 10,000 reads per library
Good_COI_data <- All_COI_data %>%
group_by(library_ID) %>%
filter(sum(Abundance)>10000) %>%
ungroup()
# 2.Summarize statistics of sequencing depth.
print("Median COI library size:")
## [1] "Median COI library size:"
Good_COI_data %>%
group_by(library_ID) %>%
summarise(SeqDepth = sum(Abundance)) %>%
pull(SeqDepth) %>% median
## [1] 39642
print("Number of final COI libraries:")
## [1] "Number of final COI libraries:"
Good_COI_data %>%
pull(library_ID) %>% unique %>% length
## [1] 257
#### NOTE: Taxonomic correction of the subgenus Ditrichocorycaeus, not yet accepted.
Good_COI_data <- Good_COI_data %>%
mutate(Genus = ifelse(Genus == "Ditrichocorycaeus", "Corycaeus", Genus),
Species = ifelse(Species == "Ditrichocorycaeus anglicus","Corycaeus anglicus", Species))
Same for 18S:
# Filter out libraries with low read counts.
Good_18S_data <- All_18S_data %>%
group_by(library_ID) %>%
filter(sum(Abundance)>10000) %>%
ungroup()
print("Median 18S library size:")
## [1] "Median 18S library size:"
Good_18S_data %>%
group_by(library_ID) %>%
summarise(SeqDepth = sum(Abundance)) %>%
pull(SeqDepth) %>% median
## [1] 54078.5
print("Number of final 18S libraries:")
## [1] "Number of final 18S libraries:"
Good_18S_data %>%
pull(library_ID) %>% unique %>% length
## [1] 188
#### NOTE: Taxonomic correction of the subgenus Ditrichocorycaeus, not yet accepted.
Good_18S_data <- Good_18S_data %>%
mutate(Genus = ifelse(Genus == "Ditrichocorycaeus", "Corycaeus", Genus),
Species = ifelse(Species == "Ditrichocorycaeus anglicus","Corycaeus anglicus", Species))
Summarize tables of sample frequency and sequencing depth. Creating table for zooplankton samples:
# 18S Zooplankton bulk data:
tmp_18S <- Good_18S_data %>%
group_by(library_ID, source_material_id, collection_date, Type, depth) %>%
summarise(Abundance = sum(Abundance)) %>%
filter(Type == "Bulk_Zooplankton") %>%
ungroup() %>%
select(collection_date, Abundance) %>%
mutate(Marker = "18S_Sequences")
## `summarise()` has grouped output by 'library_ID', 'source_material_id',
## 'collection_date', 'Type'. You can override using the `.groups` argument.
# COI Zooplankton bulk data:
tmp_COI <- Good_COI_data %>%
group_by(library_ID, source_material_id, collection_date, Type, depth) %>%
summarise(Abundance = sum(Abundance)) %>%
filter(Type == "Bulk_Zooplankton") %>%
ungroup() %>%
select(collection_date, Abundance) %>%
mutate(Marker = "COI_Sequences")
## `summarise()` has grouped output by 'library_ID', 'source_material_id',
## 'collection_date', 'Type'. You can override using the `.groups` argument.
# COI Zooplankton preservative
tmp_COI_EtOH <- Good_COI_data %>%
group_by(library_ID, source_material_id, collection_date, Type, depth) %>%
summarise(Abundance = sum(Abundance)) %>%
filter(Type == "Zooplankton_Preservative") %>%
ungroup() %>%
select(collection_date, Abundance) %>%
mutate(Marker = "COI_Sequences_(Sup)")
## `summarise()` has grouped output by 'library_ID', 'source_material_id',
## 'collection_date', 'Type'. You can override using the `.groups` argument.
# Microscopy Zooplankton data
tmp_mic <- df_count_2017 %>%
group_by(Sample_ID, Station, Date, STN_TIME) %>%
summarise(Biomass = sum(Biomass, na.rm = TRUE)) %>%
ungroup() %>%
select(collection_date = Date, Abundance = Biomass) %>%
mutate(Marker = "Microscopy_Biomass")
## `summarise()` has grouped output by 'Sample_ID', 'Station', 'Date'. You can
## override using the `.groups` argument.
# Combine and turn into table.
bind_rows(tmp_mic, tmp_18S, tmp_COI, tmp_COI_EtOH) %>%
pivot_wider(names_from = Marker, values_from = Abundance) %>%
arrange(collection_date) %>%
write_csv(file = "Data_output/Sup.Tab.S1_ZooplanktonSampleSummary.csv")
rm(tmp_mic, tmp_18S, tmp_COI, tmp_COI_EtOH)
Creating table for water samples:
# 18S Water sample sequencing depth
tmp_18S <- Good_18S_data %>%
group_by(library_ID, source_material_id, collection_date, Type, depth) %>%
summarise(Abundance = sum(Abundance)) %>%
filter(Type == "Filtered_Water") %>%
ungroup() %>%
select(collection_date, Abundance, depth) %>%
mutate(Marker = "18S_Sequences")
## `summarise()` has grouped output by 'library_ID', 'source_material_id',
## 'collection_date', 'Type'. You can override using the `.groups` argument.
# COI water sample sequencing depth
tmp_COI <- Good_COI_data %>%
group_by(library_ID, source_material_id, collection_date, Type, depth) %>%
summarise(Abundance = sum(Abundance)) %>%
filter(Type == "Filtered_Water") %>%
ungroup() %>%
select(collection_date, Abundance, depth) %>%
mutate(Marker = "COI_Sequences")
## `summarise()` has grouped output by 'library_ID', 'source_material_id',
## 'collection_date', 'Type'. You can override using the `.groups` argument.
# Combine to table
bind_rows(tmp_18S, tmp_COI) %>%
group_by(collection_date, depth, Marker) %>%
summarise(Abundance = sum(Abundance, na.rm = TRUE)) %>%
pivot_wider(names_from = Marker, values_from = Abundance) %>%
arrange(collection_date, as.numeric(depth)) %>%
mutate(`Sequences COI/18S` = paste(COI_Sequences, `18S_Sequences`, sep = " / ")) %>%
select(collection_date, depth, `Sequences COI/18S`) %>%
pivot_wider(names_from = depth, values_from = `Sequences COI/18S`) %>%
write_csv2(file = "Data_output/Suppl.Tab.S2_WaterSampleSummary.csv")
## `summarise()` has grouped output by 'collection_date', 'depth'. You can
## override using the `.groups` argument.
rm(tmp_mic, tmp_18S, tmp_COI, tmp_COI_EtOH)
## Warning in rm(tmp_mic, tmp_18S, tmp_COI, tmp_COI_EtOH): object 'tmp_mic' not
## found
## Warning in rm(tmp_mic, tmp_18S, tmp_COI, tmp_COI_EtOH): object 'tmp_COI_EtOH'
## not found
This first plot shows the relative abundance of Metazooans, compared to other organisms for 18S and COI respectively.
# Calculate % metazoans for all samples in 18S data
summary_18S <- Good_18S_data %>%
mutate(depth = ifelse(Type == "Bulk_Zooplankton", "230-0", depth)) %>%
# Summarise abundance per kingdom
group_by(collection_date, Kingdom, library_ID, depth) %>%
summarise(Abundance = sum(Abundance)) %>%
# Calculate relative abuncance per sample
group_by(collection_date, depth, library_ID) %>%
reframe(RA = Abundance/sum(Abundance), Kingdom, Abundance) %>%
mutate(Marker = "18S")
## `summarise()` has grouped output by 'collection_date', 'Kingdom', 'library_ID'.
## You can override using the `.groups` argument.
# Calculate % metazoans for all samples in COI data
summary_COI <- Good_COI_data %>%
mutate(depth = ifelse(Type == "Bulk_Zooplankton", "230-0", depth)) %>%
# Summarise abundance per kingdom
group_by(collection_date, depth, Kingdom, library_ID) %>%
summarise(Abundance = sum(Abundance)) %>%
# Calculate relative abuncance per sample
group_by(collection_date, depth, library_ID) %>%
reframe(RA = Abundance/sum(Abundance), Kingdom, Abundance) %>%
mutate(Marker = "COI")
## `summarise()` has grouped output by 'collection_date', 'depth', 'Kingdom'. You
## can override using the `.groups` argument.
# Merge the two datasets
bind_rows(summary_18S, summary_COI) %>%
# Show only proportion of animals
filter(Kingdom == "Animalia") %>%
# Arrange sampling depths
mutate(depth = ifelse(depth == "230", "260", depth)) %>%
mutate(depth = factor(depth,
levels= c("0", "5", "30",
"100", "260", "230-0"))) %>%
# Plot boxplot
ggplot() +
geom_boxplot(aes(depth, RA, color = Marker)) +
theme_cowplot(12) +
theme(aspect.ratio = 1,
strip.background =element_blank()) +
scale_color_manual(values = c('#377eb8','#4daf4a'))
ggsave("Figure_output/Sup.Fig.S1A_AnnotationSuccess.pdf", width = 4, height = 4)
rm(summary_18S, summary_COI)
In this section the different data formats of the 2017 QU39 time series will be mutated into similar data shapes and transformed in such a way that the data types can be compared. To keep filtering parameters equal for all data types, a set of functions are first defined, that are then executed for all data sets. Comparisons are done both at species and at genus level.
Key for data transformation is the index_RA function that is defined in ./Code/Functions.R. This function calculates eDNA-index and relative abundance for all taxa and samples. Read more details in the function description.
Microscopy count data from integrated zooplankton net tows:
# Species level
Net_count_2017_index_species <- df_count_2017 %>%
mutate(Depth = DEPTH_STRT,
Taxa = paste(Genus, Tax1, sep = " "),
Abundance = Biomass,
Sample = Key) %>%
index_RA(Sample, Taxa, Abundance, Depth, Date)
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
# Genus level
Net_count_2017_index_genus <- df_count_2017 %>%
mutate(Depth = DEPTH_STRT,
Taxa = Genus,
Abundance = Biomass,
Sample = Key) %>%
index_RA(Sample, Taxa, Abundance, Depth, Date)
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
The DNA datasets are transformed identical to the microscopy data sets. Due to their similarity, a generalized function is first created, then executed for the data sets and taxonomic levels individually.
# Define wrapper function
calculate_index <- function(data, taxrank, type) {
data %>%
# Filter Sample type
filter(Type == type,
# Only animals and annotated taxa are included
Kingdom == "Animalia",
str_detect({{ taxrank }}, "@", negate = TRUE)) %>%
mutate(Taxa = {{ taxrank }},
Depth = depth,
Date = collection_date,
Sample = library_ID) %>%
# Execute the transformation
index_RA(Sample, Taxa, Abundance, Depth, Date)
}
# Execute function for all subsets:
Net_18S_2017_index_species <- calculate_index(Good_18S_data, Species, "Bulk_Zooplankton")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
Net_18S_2017_index_genus <- calculate_index(Good_18S_data, Genus, "Bulk_Zooplankton")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
eDNA_18S_2017_index_species <- calculate_index(Good_18S_data, Species, "Filtered_Water")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
eDNA_18S_2017_index_genus <- calculate_index(Good_18S_data, Genus, "Filtered_Water")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
Net_COI_2017_index_species <- calculate_index(Good_COI_data, Species, "Bulk_Zooplankton")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
Net_COI_2017_index_genus <- calculate_index(Good_COI_data, Genus, "Bulk_Zooplankton")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
eDNA_COI_2017_index_species <- calculate_index(Good_COI_data, Species, "Filtered_Water")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
eDNA_COI_2017_index_genus <- calculate_index(Good_COI_data, Genus, "Filtered_Water")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
EtOH_COI_2017_index_species <- calculate_index(Good_COI_data, Species, "Zooplankton_Preservative")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
EtOH_COI_2017_index_genus <- calculate_index(Good_COI_data, Genus, "Zooplankton_Preservative")
## `summarise()` has grouped output by 'Sample', 'Taxa', 'Depth'. You can override
## using the `.groups` argument.
Defining a function for filtering singletons.
ListNoSing <- function(data, type = "DNA") {
if (type == "DNA") {data <- filter(data, Abundance > 1)}
data %>%
group_by(Taxa) %>%
summarise(n=n()) %>%
filter(n>1) %>%
pull(Taxa)
}
We compared the range of taxa that were captured with the Marine-Invertebrate COI primer vs the universal Eukaryote 18S primer, compared with microscopic zooplankton analysis.
Comparing zooplankton net samples, 18S vs COI vs microscopy.
# Genus level comp.
GenusListComp_1 <- list(
`Net Microscopy` = ListNoSing(Net_count_2017_index_genus, "m"),
`Net 18S` = ListNoSing(Net_18S_2017_index_genus),
`Net COI` = ListNoSing(Net_COI_2017_index_genus)
)
pdf("Figure_output/Fig.2A_EulerGenusNet.pdf", width = 3.5, height = 3.5)
set.seed(102)
GenusListComp_1 %>%
fromList() %>%
euler(shape = "circle") %>%
plot(quantities = TRUE,
lty = 1:5)
dev.off()
## quartz_off_screen
## 2
# Species level comp.
SpeciesListComp_1 <- list(
`Net Microscopy` = ListNoSing(Net_count_2017_index_species, "m"),
`Net 18S` = ListNoSing(Net_18S_2017_index_species),
`Net COI` = ListNoSing(Net_COI_2017_index_species)
)
pdf("Figure_output/Fig.2B_EulerSpeciesNet.pdf", width = 3.5, height = 3.5)
SpeciesListComp_1 %>%
fromList() %>%
euler(shape = "circle") %>%
plot(quantities = TRUE,
lty = 1:5)
dev.off()
## quartz_off_screen
## 2
Comparing zooplankton net samples, with water samples.
# 18S and microscopy
ListComp_18S <- list(
`Net Microscopy` = ListNoSing(Net_count_2017_index_genus, type = "m"),
`Net 18S` = ListNoSing(Net_18S_2017_index_genus),
`Water 18S` = ListNoSing(eDNA_18S_2017_index_genus)
)
pdf("Figure_output/Fig.S2A_EulerCOI.pdf", width = 3.5, height = 3.5)
set.seed(102)
ListComp_18S %>%
fromList() %>%
euler(shape = "circle") %>%
plot(quantities = TRUE,
lty = 1:5)
dev.off()
## quartz_off_screen
## 2
# COI and microscopy
ListComp_COI <- list(
`Net Microscopy` = ListNoSing(Net_count_2017_index_genus, type = "m"),
`Net COI` = ListNoSing(Net_COI_2017_index_genus),
`Water COI` = ListNoSing(eDNA_COI_2017_index_genus)
)
pdf("Figure_output/Fig.S2B_Euler18S.pdf", width = 3.5, height = 3.5)
ListComp_COI %>%
fromList() %>%
euler(shape = "circle") %>%
plot(quantities = TRUE,
lty = 1:5)
dev.off()
## quartz_off_screen
## 2
rm(GenusListComp_1, SpeciesListComp_1, ListComp_18S, ListComp_COI)
At species level, COI appears to Identify conciderably more of the counted zooplankton taxa than 18S.
At genus level all methods align better. COI preforms better than 18S, and Net samples preform better than eDNA.
From the Euler diagram it appears as only a fraction of counted zooplankton is detected by the DNA methods. However, the diagram values all taxa equally and does not account for the relative importance of the taxa. The following code extracts the biomass proportion of counted taxa that are also detected by COI and 18S respectively.
{ ### Preparing the data at Species Level:
Sum_per_species <- Net_count_2017_index_species %>%
filter(Taxa %in% ListNoSing(Net_count_2017_index_species, type = "m")) %>%
group_by(Taxa, Date) %>%
summarize(Biomass = sum(Abundance))
Sum_per_day <- Sum_per_species %>%
group_by(Date) %>%
summarize(Counted_Biomass = sum(Biomass))
Detected_COI <- Sum_per_species %>%
filter(Taxa %in% ListNoSing(Net_COI_2017_index_species)) %>%
group_by(Date) %>%
summarize(Detected_COI = sum(Biomass)) %>%
left_join(Sum_per_day, by = "Date")
Detected_18S <- Sum_per_species %>%
filter(Taxa %in% ListNoSing(Net_18S_2017_index_species)) %>%
group_by(Date) %>%
summarize(Detected_18S = sum(Biomass)) %>%
left_join(Detected_COI, by = "Date")
Sum_per_day_all <- Sum_per_species %>%
filter(Taxa %in% c(ListNoSing(Net_18S_2017_index_species),
ListNoSing(Net_COI_2017_index_species))) %>%
group_by(Date) %>%
summarize(Detected_18S_COI = sum(Biomass)) %>%
left_join(Detected_18S, by = "Date")
#Plot yearly average
fraction_species <-
Sum_per_day_all %>%
mutate(`Bulk COI` = Detected_COI / Counted_Biomass,
`Bulk 18S` = Detected_18S / Counted_Biomass,
`Bulk COI & 18S` = Detected_18S_COI /Counted_Biomass) %>%
select(Date, `Bulk COI`, `Bulk 18S`, `Bulk COI & 18S`) %>%
pivot_longer(2:4, names_to = "Marker", values_to = "Proportion") %>%
mutate(Level = "Species level")
}
## `summarise()` has grouped output by 'Taxa'. You can override using the
## `.groups` argument.
{ #Preparing the data at Genus Level:
Sum_per_genus <- Net_count_2017_index_genus %>%
filter(Taxa %in% ListNoSing(Net_count_2017_index_genus, type = "m")) %>%
group_by(Taxa, Date) %>%
summarize(Biomass = sum(Abundance))
Sum_per_day <- Sum_per_genus %>%
group_by(Date) %>%
summarize(Counted_Biomass = sum(Biomass))
Detected_COI <- Sum_per_genus %>%
filter(Taxa %in% ListNoSing(Net_COI_2017_index_genus)) %>%
group_by(Date) %>%
summarize(Detected_COI = sum(Biomass)) %>%
left_join(Sum_per_day, by = "Date")
Detected_18S <- Sum_per_genus %>%
filter(Taxa %in% ListNoSing(Net_18S_2017_index_genus)) %>%
group_by(Date) %>%
summarize(Detected_18S = sum(Biomass)) %>%
left_join(Detected_COI, by = "Date")
Sum_per_day_all <- Sum_per_genus %>%
filter(Taxa %in% c(ListNoSing(Net_18S_2017_index_genus),
ListNoSing(Net_COI_2017_index_genus))) %>%
group_by(Date) %>%
summarize(Detected_18S_COI = sum(Biomass)) %>%
left_join(Detected_18S, by = "Date")
fraction_genus <-
Sum_per_day_all %>%
mutate(`Bulk COI` = Detected_COI / Counted_Biomass,
`Bulk 18S` = Detected_18S / Counted_Biomass,
`Bulk COI & 18S` = Detected_18S_COI /Counted_Biomass) %>%
select(Date, `Bulk COI`, `Bulk 18S`, `Bulk COI & 18S`) %>%
pivot_longer(2:4, names_to = "Marker", values_to = "Proportion") %>%
mutate(Level = "Genus level")
}
## `summarise()` has grouped output by 'Taxa'. You can override using the
## `.groups` argument.
# Combine and plot
Biomass_fraction_plot <- bind_rows(fraction_genus, fraction_species) %>%
#mutate(Level = factor(Level, levels = c("Species level", "Genus level"))) %>%
ggplot(aes(Marker, Proportion)) +
geom_violin(aes(color = Marker)) +
cowplot::theme_minimal_hgrid(12) +
facet_wrap("Level") +
theme(aspect.ratio = 1,
axis.title.x=element_blank())
Biomass_fraction_plot
ggsave("Figure_output/Fig.2CD_BiomassFraction.pdf", width = 7, height = 4.5)
rm(Biomass_fraction_plot, Detected_18S, Detected_COI, fraction_genus,
fraction_species, Sum_per_day, Sum_per_day_all,
Sum_per_species, Sum_per_genus)
18S (Balzano) Ccannot be used to identify taxa at species level.
COI Covers around 38-75% of the ZP-net biomass at the specie level.
All markers have higher agreement on the genus level, but COI still outperforms 18S also at the genus level.
Combining 18S and COI is slightly, but only marginally better than using COI alone.
So the DNA methods combined targets 50-70% of zooplankton my biomass. Here we investigate the remaining genera, that remains un-targeted by the DNA methods. The Genera are ordered by relative biomass contribution.
# Detected by BOTH microscopy and ny of the other methods
microscopy_and_DNA <- Net_count_2017_index_genus %>%
filter(Taxa %in% ListNoSing(., "m"),
Taxa %in% c(ListNoSing(Net_18S_2017_index_genus),
ListNoSing(Net_COI_2017_index_genus))) %>%
group_by(Taxa) %>%
summarise(Abundance = sum(Abundance)) %>%
mutate(RA = Abundance/sum(Abundance)) %>%
filter(RA > 0.000) %>%
mutate(Type = "Microscopy and DNA")
# Only detected by microscopy
only_microscopy <- Net_count_2017_index_genus %>%
filter(Taxa %in% ListNoSing(Net_count_2017_index_genus, type = "m"),
Taxa %!in% ListNoSing(Net_18S_2017_index_genus),
Taxa %!in% ListNoSing(Net_COI_2017_index_genus)) %>%
group_by(Taxa, Date) %>%
summarize(RA = mean(RA)) %>%
group_by(Taxa) %>%
summarize(RA = mean(RA)) %>%
filter(RA > 0.000) %>%
mutate(Type = "Only by Microscopy")
## `summarise()` has grouped output by 'Taxa'. You can override using the
## `.groups` argument.
# Only detected by COI
only_COI <- Net_COI_2017_index_genus %>%
filter(Taxa %in% ListNoSing(Net_COI_2017_index_genus)) %>%
filter(Taxa %!in% ListNoSing(Net_count_2017_index_genus, type = "m")) %>%
group_by(Taxa) %>%
summarize(RA = mean(RA)) %>%
mutate(Type = "Only by COI")
only_COI
## # A tibble: 19 × 3
## Taxa RA Type
## <chr> <dbl> <chr>
## 1 Aurelia 0.00000891 Only by COI
## 2 Balanus 0.00184 Only by COI
## 3 Cancer 0.00131 Only by COI
## 4 Crepipatella 0.00000105 Only by COI
## 5 Eualus 0.000217 Only by COI
## 6 Euphysa 0.00000948 Only by COI
## 7 Eusergestes 0.000133 Only by COI
## 8 Harmothoe 0.000176 Only by COI
## 9 Hermissenda 0.0000182 Only by COI
## 10 Membranipora 0.000101 Only by COI
## 11 Mesochaetopterus 0.000335 Only by COI
## 12 Metacarcinus 0.000104 Only by COI
## 13 Nematoscelis 0.00000501 Only by COI
## 14 Ophiopholis 0.0000405 Only by COI
## 15 Petrasma 0.00000967 Only by COI
## 16 Placida 0.00170 Only by COI
## 17 Plotocnide 0.00000217 Only by COI
## 18 Spirontocaris 0.000448 Only by COI
## 19 Strongylocentrotus 0.0000415 Only by COI
# Only detected by 18S
only_18S <- Net_18S_2017_index_genus %>%
filter(Taxa %in% ListNoSing(Net_18S_2017_index_genus),
Taxa %!in% ListNoSing(Net_count_2017_index_genus, type = "m")) %>%
group_by(Taxa) %>%
summarize(RA = mean(RA)) %>%
mutate(Type = "Only by 18S")
#Combine and plot
bind_rows(microscopy_and_DNA, only_microscopy, only_COI, only_18S) %>%
mutate(Type = factor(Type, levels = c(
"Microscopy and DNA", "Only by Microscopy", "Only by COI", "Only by 18S"
))) %>%
filter(RA > 0.001) %>% ##### NOTE: REMOVED LOW COUNTS! #######
ggplot() +
geom_bar(aes(reorder(Taxa, -RA), RA), stat = "identity") +
theme_minimal_hgrid(9) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1,
face = "italic")) +
facet_wrap("Type", scales = "free_x") +
ylab("Relative abundance") +
xlab("Genus")
ggsave("Figure_output/Fig.3_RemainingTaxa.pdf")
## Saving 7 x 5 in image
rm(only_18S, only_COI, only_microscopy, microscopy_and_DNA)
18S Cannot be used to identify taxa at species level, the gene is to conserved.
COI Covers around 38-75% of the ZP-net biomass at the specie level.
All markers have higher agreement on the genus level, but COI still outperforms 18S also at the genus level.
Combining 18S and COI is slightly, but only marginally better than using COI alone.
Out of the biomass not detected by the molecular markers, larger taxa, such as amphipods, has the highest biomass.
eDNA of COI and 18S catches many different metazooan taxa not targeted by microscopy. Most of these taxa are benthic mollusks, worms, echinoderms or cnidarians.
To allow for plotting a subset of taxa, we define
# Vector for all genera
GenusList <-
Net_count_2017_index_genus %>%
# Select taxa present in both microscopy and DNA NET samples
filter(Taxa %in% ListNoSing(Net_count_2017_index_genus, "m"),
Taxa %in% c(ListNoSing(Net_18S_2017_index_genus),
ListNoSing(Net_COI_2017_index_genus))) %>%
# Pull list
pull(Taxa) %>% unique()
# Vector for the top 27 abundant genera
GenusList27 <-
Net_count_2017_index_genus %>%
# Select taxa present in both microscopy and DNA NET samples
filter(Taxa %in% ListNoSing(Net_count_2017_index_genus, "m"),
Taxa %in% c(ListNoSing(Net_18S_2017_index_genus),
ListNoSing(Net_COI_2017_index_genus))) %>%
# Select the 27 most abundant taxa
group_by(Taxa) %>%
summarize(RA = mean(RA)) %>%
arrange(-RA) %>%
mutate(count = 1:35) %>%
filter(count < 28) %>%
# Pull list
pull(Taxa) %>% unique()
# Vector for the top 15 abundant genera
GenusList15 <-
Net_count_2017_index_genus %>%
# Select taxa present in both microscopy and DNA NET samples
filter(Taxa %in% ListNoSing(Net_count_2017_index_genus, "m"),
Taxa %in% c(ListNoSing(Net_18S_2017_index_genus),
ListNoSing(Net_COI_2017_index_genus))) %>%
# Select the 27 most abundant taxa
group_by(Taxa) %>%
summarize(RA = mean(RA)) %>%
arrange(-RA) %>%
mutate(count = 1:35) %>%
filter(count < 16) %>%
# Pull list
pull(Taxa) %>% unique()
# At Species Level
SpeciesList <-
Net_count_2017_index_species %>%
filter(Taxa %in% ListNoSing(., "m"),
Taxa %in% c(ListNoSing(Net_COI_2017_index_species),
ListNoSing(eDNA_COI_2017_index_species))) %>%
pull(Taxa) %>% unique()
We compare the preformace of normalization methods (Relative abundance RA, Pressence/Absence PA, and eDNA-index RA-index) for the Zooplankton net samples, to compare microscopy, 18S and COI analysis methods. To do this, we first construct a combined dataset:
Net_merged_data <-
# Combine data sets
bind_rows(
mutate(Net_count_2017_index_genus, marker = "Microscopy"),
mutate(Net_COI_2017_index_genus, marker = "COI"),
mutate(Net_18S_2017_index_genus, marker = "18S")) %>%
# Calculate pressence / Absence from the relative abundance
mutate(PA = ifelse(RA>0.001, 1, 0)) %>%
# Remove unwanted taxa and columns
filter(Taxa %in% GenusList) %>%
select(Taxa, Date, Abundance, RA, RA_Index, PA, marker) %>%
# Wrangle the data set to make tidy for comparisons between Molecular/Microscopy
# Pivot transformation method long format.
pivot_longer(3:6, names_to = "Index_Type", values_to = "value") %>%
# Pivot marker (18S, COI, microscopy) wide.
pivot_wider(names_from = marker, values_from = value) %>%
# Make pairwise comparison: 18S-microscopy vs COI-microscopy
pivot_longer(5:6, names_to = "Gene_Marker", values_to = "Molecular") %>%
select(Taxa, Date, Gene_Marker, Index_Type, Microscopy, Molecular) %>%
# Remove samples with no matches Molecular/Mic
filter(is.na(Molecular) == FALSE,
is.na(Microscopy) == FALSE)
To evaluate the effect of analysis and normalization method on the merged datasets, a permutational multivariate analysis of variance (permanova) is conducted with 999 permutations based on the Bray-Curtis distances between samples in the community matrix using adonis from the vegan R package v. 2.6-4 (Oksanen et al., 2007). The results are visualized with 2-dimensional non-metric multidimensional scaling (NMDS) using the metaMDS function from the vegan package (allowing for 100 iterations). Mantel test was used to test for covariance between sample distance matrixes produced with microscopy vs molecular data using the mantel function in vegan with 9999 permutations.
To be able to repeat multivariate statistics and plots, multivariate statistics are defined as a function:
NMDS <- function(data = Net_merged_data, Marker = "COI", Type = "RA") {
# Construct community matrix dataframe
wide <- filter(data, Gene_Marker == Marker, Index_Type == Type) %>%
select(Taxa, Date, Microscopy, Molecular) %>%
pivot_longer(3:4, names_to = "Method", values_to = "Value") %>%
pivot_wider(names_from = Taxa, values_from = Value)
## MANTEL TEST
# Matrix for molecular
molmat <- wide %>%
filter(Method == "Molecular") %>%
mutate(Sample = Date) %>%
select(!1:2) %>%
column_to_rownames("Sample") %>%
as.matrix()
# Matrix for microscopy
micmat <- wide %>%
filter(Method == "Microscopy") %>%
mutate(Sample = Date) %>%
select(!1:2) %>%
column_to_rownames("Sample") %>%
as.matrix()
# Control that sample names are the same
control <- ifelse(row.names(molmat) != row.names(micmat),"WARNING", "OK")
ifelse("WARNING" %in% control, print("WARNING"), print("OK"))
# Execute the mantel test
m.mod <- mantel(vegdist(molmat), vegdist(micmat))
## PERMANOVA and NMDS
# Combined community matrix
mat <- wide %>%
mutate(Sample = paste(Date, Method, sep = "_")) %>%
select(!1:2) %>%
column_to_rownames("Sample") %>%
as.matrix()
# Execute permanova
permanova <- adonis2(mat ~ Method+Date,
data = mutate(wide, Date = format(as.Date(Date),
format="%m")),
permutations = 999, method="bray")
# Execute NMDS
mod <- metaMDS(mat, trymax = 100)
# DATA EXTRACTION
# Extract score coordinates from NMDS
NMDS_scores <- function(mod) {
Samples <-
scores(mod)$sites %>%
as.data.frame() %>%
rownames_to_column("Name") %>%
separate(Name, into = c("Date", "Method"), sep = "_") %>%
mutate(Layer = "Samples")
Taxa <-
scores(mod)$species %>%
as.data.frame() %>%
rownames_to_column("Taxa") %>%
mutate(Layer = "Taxa")
bind_rows(Samples, Taxa)
} # End NMDS Scores
out <- NMDS_scores(mod) %>%
mutate(Gene_Marker = Marker, Index_Type = Type)
# Combine data set of statistical output
stats <- tibble(Marker = Marker,
Type = Type,
NMDS.stress = mod$stress,
Method.R2 = permanova$R2[1],
Method.F = permanova$F[1],
Method.P = permanova$`Pr(>F)`[1],
Date.R2 = permanova$R2[2],
Date.F = permanova$F[2],
Date.P = permanova$`Pr(>F)`[2],
Mantel.R = m.mod$statistic,
Mantel.P = m.mod$signif)
output <- list(NMDS.scores = out,
NMDS.stressplot = stressplot(mod),
Stats = stats)
return(output)
}
mod1 <- NMDS(Marker = "COI", Type = "RA")
## [1] "OK"
## Run 0 stress 0.1729314
## Run 1 stress 0.1729314
## ... New best solution
## ... Procrustes: rmse 3.294049e-05 max resid 0.0001528643
## ... Similar to previous best
## Run 2 stress 0.1797813
## Run 3 stress 0.1830087
## Run 4 stress 0.1728551
## ... New best solution
## ... Procrustes: rmse 0.009756111 max resid 0.04485131
## Run 5 stress 0.1729314
## ... Procrustes: rmse 0.009757584 max resid 0.04486873
## Run 6 stress 0.1830087
## Run 7 stress 0.2133701
## Run 8 stress 0.1797813
## Run 9 stress 0.1798019
## Run 10 stress 0.21773
## Run 11 stress 0.1903895
## Run 12 stress 0.1921537
## Run 13 stress 0.1904928
## Run 14 stress 0.182842
## Run 15 stress 0.1729314
## ... Procrustes: rmse 0.00975637 max resid 0.04486273
## Run 16 stress 0.1830087
## Run 17 stress 0.1728551
## ... New best solution
## ... Procrustes: rmse 6.716636e-05 max resid 0.0003015934
## ... Similar to previous best
## Run 18 stress 0.1728551
## ... New best solution
## ... Procrustes: rmse 2.957596e-06 max resid 1.278831e-05
## ... Similar to previous best
## Run 19 stress 0.182842
## Run 20 stress 0.1729314
## ... Procrustes: rmse 0.009753565 max resid 0.04482468
## *** Best solution repeated 1 times
mod2 <- NMDS(Marker = "COI", Type = "PA")
## [1] "OK"
## Run 0 stress 0.1782295
## Run 1 stress 0.1799811
## Run 2 stress 0.2109472
## Run 3 stress 0.1879819
## Run 4 stress 0.1799811
## Run 5 stress 0.1799811
## Run 6 stress 0.1782877
## ... Procrustes: rmse 0.005405277 max resid 0.02675052
## Run 7 stress 0.1782876
## ... Procrustes: rmse 0.00539689 max resid 0.02673496
## Run 8 stress 0.1869326
## Run 9 stress 0.1799717
## Run 10 stress 0.1799811
## Run 11 stress 0.1799811
## Run 12 stress 0.1876046
## Run 13 stress 0.1942479
## Run 14 stress 0.1782876
## ... Procrustes: rmse 0.005401527 max resid 0.02673718
## Run 15 stress 0.1782296
## ... Procrustes: rmse 2.754245e-05 max resid 8.334306e-05
## ... Similar to previous best
## Run 16 stress 0.2123819
## Run 17 stress 0.1799716
## Run 18 stress 0.1879819
## Run 19 stress 0.1799811
## Run 20 stress 0.1799811
## *** Best solution repeated 1 times
mod3 <- NMDS(Marker = "COI", Type = "RA_Index")
## [1] "OK"
## Run 0 stress 0.229946
## Run 1 stress 0.2589802
## Run 2 stress 0.2299827
## ... Procrustes: rmse 0.003568419 max resid 0.01674597
## Run 3 stress 0.2332293
## Run 4 stress 0.2330485
## Run 5 stress 0.2363588
## Run 6 stress 0.2299247
## ... New best solution
## ... Procrustes: rmse 0.01241305 max resid 0.06345527
## Run 7 stress 0.2794731
## Run 8 stress 0.2337812
## Run 9 stress 0.2327175
## Run 10 stress 0.2752637
## Run 11 stress 0.229851
## ... New best solution
## ... Procrustes: rmse 0.004232046 max resid 0.02041925
## Run 12 stress 0.2548352
## Run 13 stress 0.2363588
## Run 14 stress 0.2299245
## ... Procrustes: rmse 0.004024504 max resid 0.01965912
## Run 15 stress 0.2364958
## Run 16 stress 0.2298242
## ... New best solution
## ... Procrustes: rmse 0.003470575 max resid 0.01634122
## Run 17 stress 0.2299827
## ... Procrustes: rmse 0.01167224 max resid 0.06471058
## Run 18 stress 0.2363643
## Run 19 stress 0.2363643
## Run 20 stress 0.2299247
## ... Procrustes: rmse 0.005560408 max resid 0.02046456
## Run 21 stress 0.2546502
## Run 22 stress 0.2364836
## Run 23 stress 0.2298242
## ... Procrustes: rmse 4.711618e-06 max resid 1.24609e-05
## ... Similar to previous best
## *** Best solution repeated 1 times
mod4 <- NMDS(Marker = "18S", Type = "RA")
## [1] "OK"
## Run 0 stress 0.1515485
## Run 1 stress 0.2039911
## Run 2 stress 0.1515485
## ... Procrustes: rmse 6.076688e-06 max resid 3.176846e-05
## ... Similar to previous best
## Run 3 stress 0.1515485
## ... New best solution
## ... Procrustes: rmse 5.453497e-06 max resid 2.94784e-05
## ... Similar to previous best
## Run 4 stress 0.1898251
## Run 5 stress 0.1515485
## ... Procrustes: rmse 3.977544e-06 max resid 2.128542e-05
## ... Similar to previous best
## Run 6 stress 0.1515485
## ... Procrustes: rmse 5.772845e-06 max resid 3.16019e-05
## ... Similar to previous best
## Run 7 stress 0.1515485
## ... Procrustes: rmse 1.623876e-05 max resid 9.102008e-05
## ... Similar to previous best
## Run 8 stress 0.1515485
## ... Procrustes: rmse 4.169585e-06 max resid 2.043555e-05
## ... Similar to previous best
## Run 9 stress 0.1515485
## ... Procrustes: rmse 7.08425e-06 max resid 3.92784e-05
## ... Similar to previous best
## Run 10 stress 0.1897312
## Run 11 stress 0.1515485
## ... Procrustes: rmse 5.289805e-06 max resid 2.350063e-05
## ... Similar to previous best
## Run 12 stress 0.1515485
## ... Procrustes: rmse 4.000149e-06 max resid 1.34316e-05
## ... Similar to previous best
## Run 13 stress 0.1515485
## ... Procrustes: rmse 7.698472e-06 max resid 4.324141e-05
## ... Similar to previous best
## Run 14 stress 0.1515485
## ... New best solution
## ... Procrustes: rmse 1.455532e-06 max resid 3.81463e-06
## ... Similar to previous best
## Run 15 stress 0.1515485
## ... Procrustes: rmse 2.587979e-06 max resid 1.181256e-05
## ... Similar to previous best
## Run 16 stress 0.1515485
## ... Procrustes: rmse 4.030675e-06 max resid 2.244695e-05
## ... Similar to previous best
## Run 17 stress 0.1515485
## ... Procrustes: rmse 2.780907e-06 max resid 1.433897e-05
## ... Similar to previous best
## Run 18 stress 0.1515485
## ... Procrustes: rmse 1.112566e-05 max resid 6.340486e-05
## ... Similar to previous best
## Run 19 stress 0.1515485
## ... Procrustes: rmse 4.903873e-06 max resid 2.484589e-05
## ... Similar to previous best
## Run 20 stress 0.1515485
## ... Procrustes: rmse 4.636724e-06 max resid 2.589168e-05
## ... Similar to previous best
## *** Best solution repeated 7 times
mod5 <- NMDS(Marker = "18S", Type = "PA")
## [1] "OK"
## Run 0 stress 0.1648822
## Run 1 stress 0.202059
## Run 2 stress 0.1768417
## Run 3 stress 0.237987
## Run 4 stress 0.180124
## Run 5 stress 0.2079033
## Run 6 stress 0.180124
## Run 7 stress 0.1775264
## Run 8 stress 0.1648822
## ... New best solution
## ... Procrustes: rmse 2.765246e-06 max resid 9.167204e-06
## ... Similar to previous best
## Run 9 stress 0.2194206
## Run 10 stress 0.1648822
## ... New best solution
## ... Procrustes: rmse 5.070433e-06 max resid 2.010249e-05
## ... Similar to previous best
## Run 11 stress 0.1768417
## Run 12 stress 0.2093228
## Run 13 stress 0.1775264
## Run 14 stress 0.2065656
## Run 15 stress 0.1648822
## ... Procrustes: rmse 6.572714e-06 max resid 2.828898e-05
## ... Similar to previous best
## Run 16 stress 0.2025463
## Run 17 stress 0.199336
## Run 18 stress 0.2062758
## Run 19 stress 0.180124
## Run 20 stress 0.1775264
## *** Best solution repeated 2 times
mod6 <- NMDS(Marker = "18S", Type = "RA_Index")
## [1] "OK"
## Run 0 stress 0.22905
## Run 1 stress 0.2291326
## ... Procrustes: rmse 0.004810171 max resid 0.02195877
## Run 2 stress 0.22905
## ... New best solution
## ... Procrustes: rmse 4.407711e-05 max resid 0.0001634632
## ... Similar to previous best
## Run 3 stress 0.22905
## ... Procrustes: rmse 2.091524e-05 max resid 9.052574e-05
## ... Similar to previous best
## Run 4 stress 0.22905
## ... Procrustes: rmse 1.428698e-05 max resid 4.726789e-05
## ... Similar to previous best
## Run 5 stress 0.22905
## ... Procrustes: rmse 1.164494e-05 max resid 4.209806e-05
## ... Similar to previous best
## Run 6 stress 0.22905
## ... Procrustes: rmse 1.654146e-05 max resid 6.480173e-05
## ... Similar to previous best
## Run 7 stress 0.2577171
## Run 8 stress 0.22905
## ... Procrustes: rmse 8.385198e-06 max resid 3.63588e-05
## ... Similar to previous best
## Run 9 stress 0.22905
## ... Procrustes: rmse 2.442364e-05 max resid 8.089969e-05
## ... Similar to previous best
## Run 10 stress 0.2451632
## Run 11 stress 0.2386129
## Run 12 stress 0.256818
## Run 13 stress 0.238613
## Run 14 stress 0.2291325
## ... Procrustes: rmse 0.004812485 max resid 0.02194858
## Run 15 stress 0.22905
## ... Procrustes: rmse 2.324002e-05 max resid 8.621688e-05
## ... Similar to previous best
## Run 16 stress 0.2458877
## Run 17 stress 0.2419902
## Run 18 stress 0.22905
## ... Procrustes: rmse 2.967634e-05 max resid 0.0001038422
## ... Similar to previous best
## Run 19 stress 0.22905
## ... Procrustes: rmse 2.530951e-05 max resid 0.0001150715
## ... Similar to previous best
## Run 20 stress 0.2378508
## *** Best solution repeated 10 times
bind_rows(
mod1$Stats, mod2$Stats, mod3$Stats, mod4$Stats, mod5$Stats, mod6$Stats) %>%
write_csv(file = "./Figure_output/Tab.1_MultivariateStats.csv")
We use the combined output from abuve to plot NMDS:
# Define a seasonal color scheme:
seasonal_colors = c("#053061", "#3288BD", "#66C2A5", "#7FBC41",
"#A6D96A", "#FEE08B", "#FDAE61", "#F46D43",
"#D53E4F", "#9E0142", "#67001F", "#40004B")
# Combine multivariate outputs:
bind_rows(mod1$NMDS.scores, mod2$NMDS.scores, mod3$NMDS.scores,
mod4$NMDS.scores, mod5$NMDS.scores, mod6$NMDS.scores) %>%
pivot_wider(names_from = Layer,
values_from = c(NMDS1, NMDS2)) %>%
# Modify scales and factors
mutate(Month = format(as.Date(Date), format="%m")) %>%
mutate(Index_Type = ifelse(Index_Type == "RA_Index", "eDNA Index", Index_Type)) %>%
mutate(Index_Type = factor(Index_Type, levels = c("RA", "PA", "eDNA Index"))) %>%
mutate(Taxa = ifelse(Taxa %in% GenusList15, Taxa, "")) %>%
# Plot
ggplot(aes(NMDS1_Samples, NMDS2_Samples)) +
# Taxa are represented as text
geom_text(aes(NMDS1_Taxa, NMDS2_Taxa, label = Taxa),
size = 3, colour = "grey", fontface = "italic") +
# Dotted lines combine microscopy and DNA samples from the same date
geom_line(aes(color = Month, group = Date), linetype = "dotted") +
# Points represent each net.
geom_point(aes(shape = Method, color = Month), size = 2) +
# Eclipses for CI of the method (DNA vs Microscopy)
stat_ellipse(aes(linetype = Method), size = 0.2) +
# Facet Rows and Columns
facet_grid(Index_Type~Gene_Marker, scales = "free") +
# Esthetics
theme_minimal_grid(12) +
theme(aspect.ratio = 1,
panel.border = element_rect(colour = "black", fill=NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
scale_color_manual(values = seasonal_colors) +
scale_shape_manual(values=c(1, 16))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 138 rows containing non-finite outside the scale range
## (`stat_ellipse()`).
## Warning: Removed 248 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 138 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 138 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggsave("Figure_output/Fig.3_NMDS.pdf", width = 8, height = 10)
## Warning: Removed 138 rows containing non-finite outside the scale range
## (`stat_ellipse()`).
## Warning: Removed 248 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 138 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 138 rows containing missing values or values outside the scale range
## (`geom_point()`).
linear regressions between microscopy and molecular data were modelled, using the lm function.
Both 18S and COI combined, Relative abundance vs eDNA-index
Net_merged_data %>%
filter(Index_Type %in% c("RA_Index", "RA")) %>%
#filter(Molecular > 0, Microscopy > 0) %>%
ggplot(aes(Microscopy, Molecular)) +
geom_point() +
geom_smooth(aes(group = Taxa), color = "grey", method = lm, se=F) +
geom_smooth(color = "#984ea3", method = lm, se=F) +
theme_minimal_grid() +
facet_wrap("Index_Type", scales = "free") +
theme(aspect.ratio = 1,
panel.background = element_rect(colour = 'darkgrey')) +
ylim(0,1) +
xlim(0,1)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 103 rows containing missing values or values outside the scale range
## (`geom_smooth()`).
ggsave("Figure_output/Fig.S3AB_Regression.pdf", width = 7, height = 4)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 103 rows containing missing values or values outside the scale range
## (`geom_smooth()`).
Calculate Rsq, P and Slope for all regressions:
#Create new tables for regression summary and calculate coefficients
# Empty table
species_list <- data.frame(Taxa = unique(Net_merged_data$Taxa),
RA_Rsq = 0,
#I_Rsq = 0,
RA_p.val = 0,
#I_p.val = 0,
RA_B0 = 0,
I_B0 = 0,
RA_slope = 0,
I_slope = 0)
# Fill the table for each pairwise comparison:
for(i in 1:length(species_list$Taxa)){ #Start of loop
RA <- Net_merged_data %>%
filter(Taxa == species_list$Taxa[i],
Index_Type == "RA")
RA_I <- Net_merged_data %>%
filter(Molecular > 0, Microscopy > 0) %>%
filter(Taxa == species_list$Taxa[i],
Index_Type == "RA_Index")
# Calculate the metrics:
lm1 <- lm(sqrt(Microscopy) ~ sqrt(Molecular), data = RA)
lm2 <- lm(sqrt(Microscopy) ~ sqrt(Molecular), data = RA_I)
species_list$RA_Rsq[i] <- summary(lm1)$adj.r.squared
species_list$RA_p.val[i] <- summary(lm1)$coefficients[2,4]
species_list$RA_B0[i] <- lm1$coefficients[1]
species_list$RA_slope[i] <- lm1$coefficients[2]
#species_list$I_Rsq[i] <- summary(lm2)$adj.r.squared
#species_list$I_p.val[i] <- summary(lm2)$coefficients[2,4]
species_list$I_B0[i] <- lm2$coefficients[1]
species_list$I_slope[i] <- lm2$coefficients[2]
}
species_list %>%
write_csv("./Figure_output/Tab.2_Regressions.csv")
These taxa have significant correlations:
species_list %>%
filter(RA_p.val > 0.05) %>%
pull(Taxa)
## [1] "Aetideus" "Beroe" "Calliopius" "Candacia"
## [5] "Clione" "Cyphocaris" "Discoconchoecia" "Euphausia"
## [9] "Merluccius" "Microcalanus" "Munida" "Nanomia"
## [13] "Oikopleura" "Pandalus" "Paraeuchaeta" "Pasiphaea"
## [17] "Scina" "Thysanoessa" "Triconia"
Whuile correlation coefficients are the same using RA and RA-index, the slope will differ. Here we plot the difference in slope between the two standardisation methods.
species_list %>%
filter(RA_p.val<0.05) %>%
select(RA_slope, I_slope) %>%
pivot_longer(1:2) %>%
mutate(Method = factor(name, levels = c("RA_slope", "I_slope")),
`Slope (Molecular ~ Microscopy)` = value) %>%
ggplot(aes(Method, `Slope (Molecular ~ Microscopy)`)) +
geom_violin() +
geom_point() +
theme_minimal_hgrid() +
scale_y_log10() +
theme(aspect.ratio = 1,
panel.background = element_rect(colour = 'darkgrey'))
## Warning in transformation$transform(x): NaNs produced
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning in transformation$transform(x): NaNs produced
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
ggsave("./Figure_output/Fig.S3C_SlopeDifferences.pdf", width = 4, height = 5)
## Warning in transformation$transform(x): NaNs produced
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning in transformation$transform(x): NaNs produced
## Warning in scale_y_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
The aim of this section is to compare if seasonal patterns of
RA_index overlap between the different markers and methods. Zooplankton
net tows (Count, COI and 18S) that are depth integrated (only one sample
per sampling date) are plotted differently from the eDNA samples (that
has a depth resolution with several samples per sampling date).
Genus level comparison:
Plot seasonal trend of bulk zooplankton samples
# Combine the zooplankton datasets at genus level
tmp <- bind_rows(mutate(Net_count_2017_index_genus, marker = "Count"),
mutate(Net_COI_2017_index_genus, marker = "COI"),
mutate(Net_18S_2017_index_genus, marker = "18S"))
# Arrange the taxa according to there total annual biomass
order <- tmp %>%
group_by(Taxa, marker) %>%
summarize(Abundance = sum(Abundance)) %>%
pivot_wider(names_from = "marker", values_from = "Abundance") %>%
arrange(-Count) %>%
pull(Taxa)
## `summarise()` has grouped output by 'Taxa'. You can override using the
## `.groups` argument.
tmp %>%
# Filter taxa (all) and order factors
filter(Taxa %in% GenusList) %>%
mutate(marker = factor(marker, levels = c("Count", "COI", "18S"))) %>%
mutate(Taxa = factor(Taxa, levels = c(order))) %>%
# Plot
ggplot() +
geom_line(aes(Date, RA_Index, color = marker, linetype = marker)) +
facet_wrap(facets = "Taxa", ncol = 4) +
# Esthetics
theme_minimal_hgrid() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
strip.text = element_text(face = "italic"),
legend.position="none",
panel.background = element_rect(fill = '#F5F5F5', colour = 'darkgrey'),
aspect.ratio = 1/2.5) +
scale_color_manual(values = c('#fec44f','#4daf4a','#377eb8'))
ggsave("Figure_output/Fig.5_ZooplanktonSeasonality.pdf", width = 8, height = 10)
RA_index of eDNA samples, the average of 18S and COI.
Plot verical distribution of zooplankton over the season.
# Combine and merge COI and 18S eDNA components. Calculate mean of COI and 18S
eDNA <- bind_rows(mutate(eDNA_COI_2017_index_genus, marker = "COI"),
mutate(eDNA_18S_2017_index_genus, marker = "18S")) %>%
group_by(Taxa, Date, Depth) %>%
summarize(RA_Index = mean(RA_Index),
Abundance = mean(Abundance),
marker = "eDNA")
## `summarise()` has grouped output by 'Taxa', 'Date'. You can override using the
## `.groups` argument.
# Combine eDNA data with Net data components.
data <- eDNA %>%
bind_rows(mutate(Net_count_2017_index_genus, marker = "Count"),
mutate(Net_COI_2017_index_genus, marker = "COI"),
mutate(Net_18S_2017_index_genus, marker = "18S")) %>%
filter(Taxa %in% GenusList,
Taxa %in% pull(eDNA, Taxa)) %>%
mutate(Depth = as.character(Depth)) %>%
# Modify the parameter shown on the categorical Y axis. That is depth for eDNA samples and Marker for Net samples".
mutate(y = ifelse(marker == "eDNA", Depth, marker)) %>%
mutate(y = factor(y, levels = rev(c("Count", "COI", "18S", "0", "5", "30", "100", "260")))) %>%
mutate(Taxa = factor(Taxa, levels = order))
data %>% ggplot() +
# Add filter water data:
geom_point(
mapping = aes(Date, y, color = RA_Index),
#data = filter(data, y %!in% c("Count", "COI", "18S")),
shape = 15) +
scale_color_gradient2(
high = '#984ea3', low = "white") +
# Add Microscopy data
new_scale_color() +
geom_point(
mapping = aes(Date, y, color = RA_Index),
data = filter(data, y %in% c("Count")),
shape = 15) +
scale_color_gradient2(high = '#fec44f',
low = "white") +
# Add Net COI data:
new_scale_color() +
geom_point(
aes(Date, y, color = RA_Index),
filter(data, y %in% c("COI")),
shape = 15) +
scale_color_gradient2(high = '#4daf4a',
low = "white") +
# Add Net 18S data:
new_scale_color() +
geom_point(
aes(Date, y, color = RA_Index),
filter(data, y %in% c("18S")),
shape = 15) +
scale_color_gradient2(high = '#377eb8',
low = "white") +
# Set plot themes
#theme_map() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
strip.text = element_text(face = "italic"),
# legend.position="none",
# panel.background = element_rect(fill = '#F5F5F5', colour = 'darkgrey'),
aspect.rtio = 1/2.5) +
facet_wrap("Taxa", ncol = 3)
## Warning in plot_theme(plot): The `aspect.rtio` theme element is not defined in
## the element hierarchy.
ggsave("Figure_output/Fig.6_VertivalSeasonCom.pdf", width = 7, height = 10)
## Warning in plot_theme(plot): The `aspect.rtio` theme element is not defined in the element hierarchy.
## The `aspect.rtio` theme element is not defined in the element hierarchy.
The robustness of the eDNA index transformation can be visualized by repeating the trends at species level, and also including other sample techniques.
# Combining datasets
tmp <- bind_rows(mutate(Net_count_2017_index_species, marker = "Count"),
mutate(Net_COI_2017_index_species, marker = "COI_bulk"),
mutate(EtOH_COI_2017_index_species, marker = "COI_sup"))
# Order by biomass
order <- tmp %>%
group_by(Taxa, marker) %>%
summarize(Abundance = sum(Abundance)) %>%
pivot_wider(names_from = "marker", values_from = "Abundance") %>%
arrange(-Count) %>%
pull(Taxa)
## `summarise()` has grouped output by 'Taxa'. You can override using the
## `.groups` argument.
# Arrange factors and plot.
tmp %>%
filter(Taxa %in% SpeciesList) %>%
mutate(marker = factor(marker, levels = c("Count", "COI_bulk", "COI_sup"))) %>%
mutate(Taxa = factor(Taxa, levels = order)) %>%
ggplot() +
geom_line(aes(Date, RA_Index, color = marker, linetype = marker)) +
facet_wrap(facets = "Taxa", ncol = 4) +
# Esthetics
theme_minimal_hgrid(8) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
strip.text = element_text(face = "italic"),
#legend.position="none",
panel.background = element_rect(fill = '#F5F5F5', colour = 'darkgrey'),
aspect.ratio = 1/2.5,
legend.position = "none") +
scale_color_manual(values = c('#fec44f','#4daf4a',"cyan"))
ggsave("Figure_output/Supp.Fig.S5_SeasonalBulkSpecies.pdf", width = 10, height = 7)
The ammount of sequences per taxa can hint at the reproducibility of the seasonal trends of the zooplankton. Species with low data may have less reliable results.
# Combine data sets
bind_rows(
mutate(Net_count_2017_index_genus, marker = "Microscopy"),
mutate(Net_COI_2017_index_genus, marker = "COI"),
mutate(Net_18S_2017_index_genus, marker = "18S")) %>%
filter(Taxa %in% GenusList) %>%
select(Taxa, Date, Abundance, marker) %>%
group_by(Taxa, marker) %>%
summarise(Abundance = sum(Abundance)) %>%
pivot_wider(names_from = "marker", values_from = "Abundance") %>%
arrange(-Microscopy) %>%
write_csv("Figure_output/Suppl.Tab.S4_DataAmmount.csv")
## `summarise()` has grouped output by 'Taxa'. You can override using the
## `.groups` argument.
These packages and versions were used in this script.
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] vegan_2.6-4 lattice_0.22-5 permute_0.9-7 ggnewscale_0.5.0
## [5] patchwork_1.3.0 ggpubr_0.6.0 cowplot_1.1.3 ggspatial_1.1.9
## [9] ggOceanMaps_2.2.0 eulerr_7.0.0 UpSetR_1.4.0 lubridate_1.9.3
## [13] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
## [17] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
## [21] tidyverse_2.0.0 readxl_1.4.3
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-164 sf_1.0-15 bit64_4.0.5 tools_4.2.1
## [5] backports_1.4.1 bslib_0.7.0 utf8_1.2.4 R6_2.5.1
## [9] KernSmooth_2.23-22 DBI_1.2.1 mgcv_1.9-1 colorspace_2.1-0
## [13] withr_3.0.0 tidyselect_1.2.0 gridExtra_2.3 bit_4.0.5
## [17] compiler_4.2.1 textshaping_0.3.7 cli_3.6.2 labeling_0.4.3
## [21] sass_0.4.9 scales_1.3.0 classInt_0.4-10 proxy_0.4-27
## [25] systemfonts_1.0.5 digest_0.6.35 rmarkdown_2.26 pkgconfig_2.0.3
## [29] htmltools_0.5.8.1 fastmap_1.1.1 highr_0.10 rlang_1.1.3
## [33] rstudioapi_0.16.0 jquerylib_0.1.4 farver_2.1.2 generics_0.1.3
## [37] jsonlite_1.8.8 vroom_1.6.5 car_3.1-2 magrittr_2.0.3
## [41] Matrix_1.6-5 Rcpp_1.0.12 munsell_0.5.1 fansi_1.0.6
## [45] abind_1.4-5 lifecycle_1.0.4 stringi_1.8.4 yaml_2.3.8
## [49] carData_3.0-5 MASS_7.3-60.0.1 plyr_1.8.9 grid_4.2.1
## [53] parallel_4.2.1 crayon_1.5.2 stars_0.6-4 splines_4.2.1
## [57] hms_1.1.3 polylabelr_0.2.0 knitr_1.46 pillar_1.9.0
## [61] ggsignif_0.6.4 glue_1.7.0 evaluate_0.23 vctrs_0.6.5
## [65] tzdb_0.4.0 cellranger_1.1.0 polyclip_1.10-6 gtable_0.3.5
## [69] cachem_1.0.8 xfun_0.43 broom_1.0.5 e1071_1.7-14
## [73] rstatix_0.7.2 ragg_1.2.7 class_7.3-22 units_0.8-5
## [77] cluster_2.1.6 timechange_0.3.0